NAVAL POSTGRADUATE SCHOOL 
Monterey, California 




THESIS 



A COMPARISON OF THREE NUMERICAL 
METHODS FOR 
UPDATING REGRESSIONS 

by 

Grigorios J. Raptis 
September, 1991 

Thesis Advisor: Dan C. Boger 

Thesis Co-advisor: William B. Gragg 



Approved for public release; distribution is unlimited. 



T 26068 1 



SECURITY CLASSIFICATION OF THIS PAGE 



la REPORT SECURITY CLASSIFICATION 
unclassified 


1b RESTRICTIVE MARKINGS 


2a SECURITY CLASSIFICATION AUTHORITY 


3 DISTRIBUTION/AVAILABILITY OF REPORT 
Approved for public release; distribution is unlimited. 


2b DECLASSIFICATION/DOWNGRADING SCHEDULE 


4. PERFORMING ORGANIZATION REPORT NUMBER(S) 


5 MONITORING ORGANIZATION REPORT NUMBER(S) 


6a NAME OF PERFORMING ORGANIZATION 
Naval Postgraduate School 


6b OFFICE SYMBOL 
(If applicable) 
OR 


la NAME OF MONITORING ORGANIZATION 
Naval Postgraduate School 


6c ADDRESS (C/ty, State, andZIPCode) 
Monterey, CA 93943-5000 


7b ADDRESS {City, State, and ZIP Code) 
Monterey, CA 93943-5001) 


8a NAME OF FUNDING/SPONSORING 
ORGANIZATION 


8b OFFICE SYM80L 
(If applicable) 


9 PROCUREMENT INSTRUMENT IDENTIFICATION NUMBER 


8c ADDRESS {City, State, and ZIP Code) 


10 SOURCE OF FUNDING NUMBERS 



REPORT DOCUMENTATION PAGE 



Program tlemem No 



Project No 



I 



Work Una Accei^on 
Number 



1 1 TITLE (Include Security Classification ) 

A COMPARISON OP THREE NUMERICAL METHODS FOR UPDATING REGRESSIONS 



12 PERSONAL AUTHOR(S) 



Grigorios J. Raptis 



13d TYPE OF REPORT 
Master’s Thesis 



| 13b TIME COVERED 
I From To 



14 DATE OF REPORT (year, month, day) 
1991 , September 



15 PAGE COUNT 

95 



16. SUPPLEMENTARY NOTATION 

The views expressed in this thesis are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. 
Government. 



17. COSATI CODES 


FIELD 


GROUP 


SUBGROUP 















1 8 SUBJECT TERMS (continue on reverse if necessary and identify by block number) 

Hyperbolic Rotation, Gram Schmidt Factorization, Reorthogonalization, Column 
Permutation, Updating 



19 ABSTRACT (continue on reverse if necessary and identify by block number) 

Three numerical procedures are presented for updating regressions. All three methods are based on QR factorization, but alter that they use 
different philosophies to update the regression coefficients. Elden’s algorithm updates using only the triangular matrix R. This procedure does 
not use orthogonal transformations, but it uses hyperbolic rotations. The modified Gram-Schmidt QR process is used by Gragg-Leveque- 
Trangenstein’s method where the matrix with orthonormal columns is stored and updated. Chan’s algorithm computes a column permutation TT 
and a QR factorization of a matrix A such that a rank deficiency of A will be revealed. Although the three methods are based on different ideas 
and can be used for different purposes their comparison shows that Chan’s algorithm is the only accurate one in the rank deficient case, and 
that Gragg-Leveque-Trangenstein’s method is the cheapest and the most stable. 



20 DISTRIBUTION/AVAILABILITY OF ABSTRACT 

PI UNCLASSIHED/UNLIMITED fH SAME AS REPORT PI OTIC USERS 



21 ABSTRACT SECURITY CLASSIFICATION 
Unclassified 



22a NAME OF RESPONSIBLE INDIVIDUAL 
Dan C. Roger 



22b TELEPHONE (Include Area code) 
(408)646-2607 



22c OFFICE SYMBOL 
OK/Bo 



DD FORM 1473, &4 MAR 



83 APR edition may be used until exhausted 
All other editions are obsolete 



SECURITY CLASSlfiCAT ION OF THIS PAGE 

Unclassified 



1 



Approved for public release; distribution is unlimited. 



A Comparison of Three Numerical 
Methods for 
Updating Regressions 

by 

Grigorios J. Raptis 

Lieutenant Commander, Hellenic Navy 
B.S., Technical University of Athens 
B.S., Hellenic Naval Academy 

Submitted in partial fulfillment 
of the requirements for the degree of 

MASTER OF SCIENCE IN OPERATIONS RESEARCH 

from the 

NAVAL POSTGRADUATE SCHOOL 
September J/991 ^ 



ABSTRACT 



Three numerical procedures are presented for updating regressions. All three 
methods are based on QR factorization, but after that they use different philosophies 
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triangular matrix R. This procedure does not use orthogonal transformations, but it 
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factorization of a matrix A such that a rank deficiency of A will be revealed. Although 
the three methods are based on different ideas and can be used for different purposes 
their comparison shows that Chan’s algorithm is the only accurate one in the rank 
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I. 



INTRODUCTION 



A. BACKGROUND 

Regression analysis provides a variety of modeling 
techniques that allows the analyst to relate a dependent 
variable to one or more independent variables. The 
mathematical complexity of the model and the degree to which 
it is a realistic model will depend on how much is known about 
the process being studied and on the purpose of the modeling 
exercise. In preliminary studies of a process or in cases 
where prediction is the primary objective, the models will 
frequently fall in the class of models that are linear in the 
parameters. That is, the parameters enter the model as simple 
coefficients of the independent variables or functions of the 
independent variables. Such models will be referred to loosely 
as linear models. In many statistical problems, it is useful 
to express the dependent variable as a linear function of the 
independent variables. Furthermore, regression analysis can 
summarize data and quantify the nature and strength of the 
relationships among variables. Regression analysis can also be 
used to predict new values of the dependent variables based on 
observed relationships. 
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B. ANALYSIS OF REGRESSION 



As the reader will see in the following chapter, the main 
purpose of this thesis is to analyze the efficiency and 
numerical stability of some procedures for updating 
regressions. Most regression problems, however, require 
decisions on which variables to include in the model, the form 
the variables should take, for example, X, X 2 , 1/X, etc. , and 
the functional form of the model. It is assumed that there is 
a set of t candidate variables, which presumably includes all 
relevant variables, from which a subset of r variables is to 
be chosen for the regression equation. The candidate variables 
may include different forms of the same basic variable, such 
as X and X 2 , and the selection process may include constraints 
on which variables are to be included. For example, X may be 
forced into the model if X 2 is in the selected subset. This is 
a common constraint in building polynomial models. 

There are three distinct problem areas related to this 
general topic: 

■ The theoretical effects of variable selection on the 
least squares regression results. 

■ The computational methods for finding the "best" subset 
of variables for each subset size. 

■ The choice of subset size (for the final model), or the 
"stopping rule". 
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This thesis mainly discusses the second problem area of 
finding the more efficient and numerically stable 
computational methods. Also, we generally discuss the criteria 
for the "best" subset size choice, in other words the criteria 
for the "stopping rules". 

The simplest linear model involves only one independent 
variable and states that the true mean of the dependent 
variable changes at a constant rate as the value of the 
independent variable increases or decreases. Thus, the 
functional relationship between the true mean of Y x , E(Yi) , and 
X A is the equation of a straight line, 

EO^) =p 0 + p i (X i ) , 

where fi 0 is the intercept, or the value of E(Y t ) when X = 0, 
and is the slope of the line, or the rate of change in E (YJ 
per unit change in X. The constants /3 0 and fi x are population 
parameters which are to be estimated from the sample of 
observations . 

The observations of the dependent variables, Y it are 
assumed to be random observations from populations of random 
variables with the mean of each population given by E(Yi) . The 
deviation of an observation Y x from its population mean E(Yi) 
is taken into account by adding a random error, € Xl to give 
the statistical model 
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r^Po+Pi^+e,. 



( 1 . 1 ) 



The subscript i indicates the particular observation unit, i 
= l,2,...,n. The X i are the n observations of the independent 
variable and are assumed to be measured without error; that 
is, the observed values of X are assumed to be a set of known 
constants. The Y L and are paired observations; both are 
measured on every observational unit. The random errors 
are assumed to be normally, independently and identically 
distributed: [Ref. 1] 

e r NID{0, a 7 ) . 



1. The Matrix Model 

Given the above regression model, we can write it in 
matrix form as Y = X6 + e , where Y is the n x 1 column vector 
of observations of the dependent variable. The n x p matrix X, 
where x 10 = x 20 = ... = x n0 = 1, will be called the regression 
matrix, and the x^'s are generally chosen so that the column 
of X are linearly independent; that is X has rank p. However, 
in some experimental design situations the elements of X are 
chosen to be 0 or 1, and the columns of X may be linearly 
dependent. In this case X is commonly called the design 
matrix. Also, /J is the p x 1 vector of parameters to be 
estimated; and e is the n x 1 vector of random errors. 
Writing out the components of Y = X/? + e yields 
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( 1 . 2 ) 
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The elements of a particular row, r, of X are the 
coefficients of the corresponding parameters in /3 which give 
E(Y r ) . The vectors Y and € are random vectors? the elements of 
these vectors are random variables. The matrix X is considered 
to be a matrix of known constants. The vector |5 is a vector of 
unknown constants and is to be estimated from the given data. 

Using the above assumptions on e if the random vector 
e has a multivariate normal distribution with a mean of the 
zero (0) vector. The variance-covariance matrix for any random 
vector of n elements is defined as an n x n symmetric matrix 
with the diagonal elements equal to the variance of e L and the 
off-diagonal elements equal to the covariance between and 
£j. Let Z be a nxl vector of random variables z lt z 2 , z 3 , ... 
,z n ; the variance-covariance matrix of Z is the following 
n x n matrix. 
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Var ( Z) 



' var{z 2 ) cov(z ir z 2 ) 
cov(z 2 ,z 1 ) var(z 2 ) 



■ • ■ 



cov{z^,z n y 

cov{z 2 ,z n ) 



(1.3) 



K cov{z a ,z 2 ) cov(z n ,z 2 ) ... var(z n ) ; 

The variance-covariance matrix of e is la 2 , and since 
e is independent and identically distributed, the distribution 
of e is written in shorthand notation as 

e~N(0, la 2 ) , 

where I is the n x n identity matrix and a 2 is the common 
variance of all € 1 . Furthermore, since the elements of X and 
/? are constants, the Xj3 term in the model is a set of 
constants being added to the vector of random errors, e. Thus, 
Y is a random vector with mean vector X/J and variance- 
covariance matrix la 2 : 

E(Y) =E(xp+e) =^( JTp) +E(e) =Xp (1.4) 

Vaz(Y) =Var(Xp+e) =Var(z) =Jo 2 (1.5) 

Var(Y) is the same as Var ( € ) since adding a constant to a 
random variable does not change the variance. When e is 
multivariate normally distributed, Y is also multivariate 
normally distributed. Thus, 
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Y~N(X$, To 2 ) (1*6) 

This result is based on the assumption that the linear model 



being used is the correct model [Ref. 2:. 68]. 



2. The Normal Equations 

Before considering the problem of estimating /? , we 
note that we are referring to the model (2), where X i0 is not 
necessarily constrained to be unity. In the case when X i0 f 1 
we have to use a notation in which i runs from 0 to p-1 rather 
than 1 to p. However, since the major application of the 
theory is to the case X i0 = 1, it is convenient to separate 0 O 
from the other ' s right from the outset. 

The oldest method of obtaining an estimate of fi is the 
method of least squares. It dates back at least to Gauss and 
consists of minimizing w ith respect to /3 (see Fig. 1); 

that is, setting q = X/3 , we minimize e 1 e 1 = || Y - q|| 2 subject 
to qe!R[X] = W, where W is the range space of X {y:y=Xx for 
some x}. If we let q vary in W, || Y - q|| 2 (the square of the 
length of Y - q) will be a minimum for q = q* when (Y - q*) is 
orthogonal with W (see Fig. 1) . 

Thus 



X t (Y - q*) = 0 



(1.7) 



or 



1 We denote the transpose of a vector y by y T , and 

similarly for a matrix transpose. 
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Figure 1. The method of least squares consists of finding 
A such that AB is a minimum. 



X T g* = X t Y. (1-8) 

Here q* is uniquely determined, being the unique orthogonal 
projection of Y onto W. Now given that the columns of X are 
linearly independent, there exists a unique vector (0) such 
that q* = X0. Therefore substituting in (1.8) we have 

X T X p = X T Y (1.9) 

the so-called normal equation(s) . At this point we assume that 
X has rank p, so X T X is positive definite and therefore 
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nonsingular. If so, equation (1.9) has a unique solution, 
namely, 

0 = ( X T X) - 1 X T Y ( 1 - 10 ) 

Here P is called the ordinary least squares estimate of /} . 
Computational methods for calculating P are given in the 
following chapters. Notice that P can also be obtained by 
writing 

c r e = (r-Xp) T (Y-X$) = Y T Y-2P T X T Y+P T X T XP 

using the fact that /J T X T Y = (/3 T X T Y) T = Y T X/J , and differentiating 
e 7 e with respect to . Thus from de^e/d/} = 0 we have 

-2X T Y+2X T X& = 0 
or 

x T xp = x t y. 

This solution for ft gives us a stationary value of e T e , and 
a simple algebraic identity confirms that P is a minimum 
[Ref. 2]. 

The multiplication X T X generates a p 1 x p 1 matrix 
where the diagonal elements are the sums of squares of each of 
the independent variables and the off-diagonal elements are 
the sums of products between independent variables. The 
general form is 
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Summation in all cases is over i = 1 to n, the n observations 
in the data. When only one independent variable is involved, 
X T X consists of only the upper-left 2x2 matrix. Inspection 
of the normal equations will reveal that the elements in this 
2x2 matrix are the coefficients of $ 0 and . 

The elements of the matrix product X T Y are the sums of 
products between each independent variable in turn and the 
dependent variable: 



X t Y 



E v, 

E-w, 



9 9 9 

.E-Vi. 



( 1 . 12 ) 



The first element, EY if is the sum of products between the 
vectors of ones (the first column of X) and Y. As is mentioned 
above, if only one independent variable is involved, X T Y 
consists of only the first two elements. 
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The fitted regression 



y = zb. 



is denoted by 



Y ( = [(Y^)]) , and the elements of 

e = Y-Y = Y-X$ = (I D ~X(X T X) - 1 X T ) Y = ( I D ~P) Y , say, 

are called the residuals . The minimum value of 6 T £ , namely, 

e T e = (r-xfl) r (r-xfJ) 

= Y t Y-2$ t X t Y+$ t X t X$ 



= Y t Y-$ t X t Y+$ t [X t X$-X t Y ] 



= Y t Y-$ t X t Y = Y T Y-$ T X T X$ , (1.13) 

is called the residual sum of squares (RSS) . Notice that Y and 
e are uniquely defined by (5 . 

C. VARIABLE SELECTION IN LEAST SQUARES ANALYSIS 

The purpose of the least squares analysis will influence 
the manner in which the model is constructed. Hocking [Ref. 4] 
relates six potential uses of regression equations given by 
Mallows [Ref. 3]: 

■ Providing a good description of the behavior of the 
response variable 

■ Prediction of future responses and estimation of mean 
responses 
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■ Extrapolation, or prediction of responses outside the 
range of the data 

■ Estimation of parameters 

■ Control of a process by varying levels of input 

■ Developing realistic models of the process. 

Assume that the correct model involves t independent 
variables but that a subset of p variables, chosen randomly or 
on the basis of external information, is used in the 
regression equation. Let X p and /} p denote submatrices of X and 

/5 that relate to the p selected variables. 0 p will denote the 

least squares estimates of /J p obtained from the p-variate 
subset model and MS(Res p ) the mean squares residual obtained 
from the p-variate subset model. After the above the following 
properties are summarized: 

■ MS(Res p ) is a positively biased estimate of a 2 unless the 
true regression coefficients for all deleted variables are 
zero. 

■ 0 p is a biased estimate of )S p and less variable than the 

corresponding statistics obtained from the t-variate model 
[Ref. 4]. 

Stepwise regression methods are variable selection methods 
which identify good (although not necessarily the best) subset 
models, with considerably less computing than required for all 
possible regressions. The subset models are identified 
sequentially by adding or deleting, depending on the method. 
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the one variable that has the greatest impact on the residual 
sum of squares. These stepwise methods are not guaranteed to 
find the "best" subset for each subset size, and the results 
produced by different methods may not agree with each other. 

D. RESEARCH METHODOLOGY 

Given the regression model Y = X/J + e , where X is n x p . 
a number of computational techniques have being suggested for 
solving the following steps: 

■ Solve the normal equations X T X = X T Y . 

■ Calculate the residual e = r-**. 

■ Calculate the residual sum of squares RSS = e T e. 

■ Update the regression model (that is, add or remove a 
row of X) . 

■ Add or remove a regressor (that is, add or remove a 
column of X. ) 

■ Calculate an F-statistic for a general linear 
hypothesis . 

Solving the above steps is a common problem in a computer 
laboratory. These problems arise in a variety of areas and in 
a variety of contexts. Linear least squares problems are 
particularly difficult to solve because they frequently 
involve large quantities of data, and they are frequently ill- 
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conditioned 2 . The research methodology will be to describe 
acceptable procedures for updating regressions. The procedures 
to be chosen should be economically attractive and numerically 
accurate, in the case that the number of observations in the 
regression is large compared to the number of variables in the 
regression model. Each of these procedures (algorithms) has 
advantages and disadvantages which will be explained below. 



E. THESIS ORGANIZATION 
1 . General 

The main purpose of this thesis is to bring to the 
attention of readers numerically acceptable procedures for 
adding and deleting rows and columns from a regression model. 
For the case of a single row to be inserted or deleted, the 
algorithms use simple techniques: plane or hyperbolic 

rotations and the Gram-Schmidt process. The algorithms for 
inserting rows are stable, but the problem of deleting a row 
may be ill-conditioned and some algorithms for this process 
may be numerically unstable. 

The need for updating regression results arises for 
various statistical or numerical reasons. When data are 



2 A set of linear equations Bx = c is said to be ill- 
conditioned if small errors or variations in the elements of 
B and c have a large effect on the solution x. 



14 



arriving sequentially, it may be undesirable or impossible to 
wait for all the data before obtaining some regression 
results. In various time-series problems, one is interested in 
the changing relationships between variables. A regression 
model with a fixed number of lagged terms, as it moves over a 
series of data, generates a "window" on the sample, with a new 
observation added, and an old one deleted 3 , as the window 
moves to the next point in the series [Ref. 5], 

2. Basic Schedule 

The procedures for updating regressions that we shall 
discuss here are the following: 

■ An algorithm based on the normal equations 

(Efroymson M. A., 1960[Ref. 6], Draper N. and Smith 

H. , 1966 [Ref. 7]); this had been the standard 

introductory approach in regression courses. In this algorithm 
the regression coefficients are solved using Gauss-Jordan 
elimination on the normal equations. However, the problem of 
solving the normal equations is frequently much more ill- 
conditioned than the original problem of solving the 
overdetermined linear equations. 

■ Stepwise regression analysis with orthogonal 

transformations (Lars Elden 1972) [Ref. 8]. In this 



3 But in this case the problem can be ill-conditioned, 
because the rank can be decreased by this operation. 
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